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A theoretical framework for calibration in computer models: parametrization, 
estimation and convergence properties * 
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Abstract. Calibration parameters in deterministic computer experiments are those attributes that cannot be 
measured or available in physical experiments. Kennedy and O’Hagan [18] suggested an approach 
to estimate them by using data from physical experiments and computer simulations. A theoretical 
framework is given which allows us to study the issues of parameter identifiability and estimation. 
We define the L 2 -consistency for calibration as a justification for calibration methods. It is shown 
that a simplified version of the original KO method leads to asymptotically L 2 -inconsistent calibra¬ 
tion. This L 2 -inconsistency can be remedied by modifying the original estimation procedure. A 
novel calibration method, called the L 2 calibration, is proposed and proven to be L 2 -consistent and 
enjoys optimal convergence rate. A numerical example and some mathematical analysis are used to 
illustrate the source of the L 2 -inconsistency problem. 

Key words, computer experiments, uncertainty quantification, Gaussian process, reproducing kernel Hilbert 
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1. Introduction. Because of the advances in complex mathematical models and fast com¬ 
puter codes, experiments on a computer, or referred to as computer experiments in the statis¬ 
tical literature, have become popular in engineering and scientific investigations. Computer 
simulations can be much faster or less costly than running physical experiments. Furthermore, 
physical experiments can be difficult to conduct as in the detonation of explosive materials 
or even infeasible when only rare events like land slide or hurricane are observed. Therefore 
computer simulations can be a stand-alone tool or combined with (typically smaller) data 
from physical experiments or field observations. There are many successful applications of 
computer experiments as reported in the literature. For a review of the general methodologies 
and examples, see the books by [22], and [9], and the November 2009 issue of Technometrics, 
which was devoted to computer experiments. 

In this paper we consider the situations in which both physical experiments/observations 
and computer simulations are conducted and some input variables in the computer code are 
either unknown or unmeasured in the physical experiment. We refer to them as calibration 
parameters. From the responses in physical experiments alone, we cannot estimate the true 
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values of the calibration parameters. We can run the computer codes by choosing selected 
values of the calibration parameters. From the combined data of the two sources, we can make 
inference about the parameters. That is, we can use the physical responses to calibrate the 
computer model. Apart from the calibration parameters, control variables are also involved 
as in standard computer experiments [22]. 

We use a spot welding example from [4] to illustrate the control variables and calibration 
parameters. In resistance spot welding, two sheets of metal are compressed by water-cooled 
copper electrodes under an applied load, L. A direct current of magnitude C is supplied to the 
sheets by two electrodes to create localized heating at the interface (called “faying surface”) 
between the two sheets. The heat produced by the current flow across the faying surface 
leads to melting, and, after cooling, a weld ’’nugget” is formed. The size of nugget is taken as 
the response because it gives a good measure of the strength of the weld. Here L and C are 
considered as control variables. The contact resistance at the faying surface is a calibration 
parameter because it cannot be measured in physical experiments but can be used as an input 
variable to a hnite element code called ANSYS. 

In their pioneering work Kennedy and O’Hagan [18] proposed a model to link the two data 
sources by employing Gaussian process models, which are commonly used in the computer ex¬ 
periments literature. Since its publication, this approach has received a great deal of attention 
in the statistical literature. See [3, 4, 13, 14, 15, 16, 26], among others. It has seen a variety 
of applications, including hydrology, radiological protection, cylinder implosion, spot welding, 
micro-cutting and climate prediction, which were reported in the papers mentioned above and 
also in [11] and [19]. In spite of its importance as a methodology and significant practical 
impact, there has been no theoretical research on its modeling and estimation strategies. The 
main purpose of this paper is to provide a theoretical framework that facilitates the study 
of issues of parametrization, estimation and modeling in the Kennedy-O’Hagan formulation. 
For simplicity, we shall refer to Kennedy-O’Hagan as KO in the rest of the paper. 

The paper is organized as follows. Some basic notation and terminology are given in 
Section 2. A new theoretical framework for the calibration problem and its connection to 
function approximation via Gaussian process modeling are given in Section 3. In particular, 
the lack of identihability of the calibration parameters is discussed and a well-defined notion 
of calibration parameters is proposed by using the L 2 distance projection. The L 2 -consistency 
is dehned as a justification for calibration methods. The KO modeling strategy is discussed 
in Section 3.1. In order to provide a clean and workable mathematical analysis, we consider 
in Section 4.1 some simplihcations of their original formulation. One is to drop the prior, 
which should not affect the general conclusions of our work because the information in the 
prior becomes negligible as the data gets larger. Thus we shall refer to calibration based on 
this simplification as the KO calibration. A key result is Theorem 4.2, which states that the 
likelihood calibration is asymptotically L 2 -inconsistent according to our dehnition of the true 
calibration parameters. A numerical example is given to show that the L 2 -inconsistency can 
have a dramatic effect in small samples and some mathematical analysis is given to shed some 
light on why this happens. See Section 4.1 and 4.3. To rectify the L 2 -inconsistency problem, a 
modihcation of the KO calibration is proposed in Section 5.1 by introducing a scale parameter 
into its correlation function. When the scale parameter converges to -|-oo at a certain rate, L 2 - 
consistency is restored (see Theorem 5.2). The convergence rate of the original (unmodified) 
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KO calibration is given in Theorem 5.3 of Section 5.2. To achieve both L 2 -consistency and 
optimal convergence rate, we introduce a new method called least L 2 distance calibration in 
Section 5.3 and prove such properties in Theorem 5.4 for the case of cheap code, i.e., when 
the computer code can be evaluated with no cost. Its extension to expensive code is given in 
Theorem 6.1 of Section 6. The convergence rate is slower than that in Theorem 5.4 because, 
in expensive code, there is cost in evaluating the code and an unknown function y® associated 
with the code needs to be estimated from data. Concluding remarks are given in Section 7. 
Technical details are given in two appendices. Throughout the paper, mathematical tools and 
results in native space [27] are extensively used. 

2. Preliminaries. For a convex and compact region C let C'(n) be the set of 
continuous functions over Q. For a multiple index a = (ai,..., a^), define [aj =ai + ... + arf. 
Given x = (xi,..., Xd) and function /, we denote the partial derivatives of / by 


D^f 


dM 


For integer fc > 0, define (7^(0) = {/ € (7(12) : 72“/ £ (7(12) for jaj < k). For a function / 
over 12, dehne the L 2 norm as ||/||L 2 (r 2 ) = (Jq the Sobolev norm as 


{2-1) ll/ll«‘(n) = ,/E ll«“/llMn)' 

y 

The Sobolev space 77^(12) consists of functions with finite 77^(12) norm value. The definition 
of the Sobolev spaces can be extended to the case where k is a real number. Such spaces are 
called the fractional Sobolev spaces and we refer to [1] for details. 

Functional approximation methods play an important role in the estimation of the cali¬ 
bration parameters. In this article, we consider the method of kernel interpolation [10]. This 
method provides a good functional approximation when the design T> consists of scattered 
points, i.e., the design points do not have any regular structure. 

Suppose y is a smooth function over 12 and y(xi),..., y(x„) are observed. A kernel inter¬ 
polator y is built as follows. First choose a symmetric positive dehnite function 4>(-,-) over 
17 X 17. Two common choices for 4> are the squared exponential family (also referred to as the 
Gaussian correlation family), with 

(2.2) 4>(s,7;())) = exp{-(/>||s - t||^} 
and the Matern family [24], with 

(2.3) 4>(s, t; V, 4>) = (2v^(?7||s - t\\y (2v^(/>||s - 7||) , 

where Ky is the modified Bessel function of the second kind. Let $ = {^{xi,Xj))ij. Since the 
function <h is positive definite, the matrix $ is also positive definite. Thus the linear system 
about u = (ui,..., Uny 


( 2 . 4 ) 


Y = ^u 
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has a unique solution u = ^ , where Y = {y{xi), ..., y{xn))^■ For x € fl, let 

n 

(2.5) y{x) = y^^Ui^{x,Xi). 

i=l 

It can be verified that y{x) indeed interpolates {xi,y{xi)ys. 

We call the kernel stationary if <h(xi, X 2 ) depends only on the difference xi —X 2 - Another 
special case of the kernel interpolation is the radial basis function interpolation [5], in which 
the kernel function ^{xi,X 2 ) depends only on the distance ||a;i — X 2 II as in (2.2) or (2.3). 
The choice of the kernel function is critical to the performance of the interpolation. Cross- 
validation is a common method for choosing a suitable kernel function, see [22, 20]. 

In computer experiments, Gaussian process models are widely used as surrogate models 
for unknown functions [21]. There is a known relationship between the kernel interpolation 
and the Gaussian process prediction [2]. Suppose z{x) is a Gaussian process on 12 with mean 
0 and covariance function <I>. Then given Z = (z(xi),..., z{xn))'^, the predictive mean of z{x) 
for any x is 

(2.6) E[z{x)\Z] = $(x,x)T$-^Z, 

where <h(x,x) = ($(a;,a;i),..., ^(x, x^))"*". It can be seen that y in (2.5) has the same form 
as the predictive mean in (2.6). For details, see the book [22]. 

3. Calibration Problem. Suppose we have a physical system with a vector of control 
variables x as its input. Denote the input domain of a; by D, which is a convex and compact 
subset of The response of this system given x is denoted as y^{x). We call the physical 
system deterministic if y^{x) is a fixed value for each x G D, and stochastic if y^{x) is random 
for some x. To study the response surface, we conduct experiments on a selected set of points 
{xi,..., Xn}- The set V = {xi,..., x^} is called the experimental design or design for brevity. 

We also have a computer code to simulate the physical system. The input of this computer 
code consists of two types of variables: the control variable x and the calibration variable 6. 
The latter represents inherent attributes of the physical system, which cannot be controlled 
in the physical experiment. Denote the input domain for 9 by 0, which is a compact subset 
of R'^. The computer code gives a deterministic function of x and 6, denoted as y^{x,9). 

Gomputer experiments are usually much less costly than the corresponding physical ex¬ 
periments. In an ideal situation, a computer run only takes a short time so that we can run 
the computer code as many times as we want. Mathematically speaking, we call a computer 
code cheap if the functional form for y® is known. However, computer runs may be time- 
consuming so that we can only evaluate y® on a set of design points. In this case, an estimate 
y®(-) based on the observed y® values (and the corresponding input values) is needed and we 
call the computer code expensive. 

In many cases, the true value of the calibration parameters cannot be measured physi¬ 
cally. For instance, material properties like porosity and permeability are important computer 
inputs in computational material simulations, which cannot be measured directly in physical 
experiments. A standard approach to identify those parameters is to use physical responses to 
adjust the computer outputs. Use of the physical response and computer output to estimate 
the calibration parameters is referred to as calibration. 
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3.1. Kennedy-O’Hagan Method. Kennedy and O’Hagan [18] was the first to propose a 

Bayesian framework for the estimation of the calibration parameters. The original version of 
the Kennedy-O’Hagan method works for stochastic systems with expensive computer codes. 
Denote the physical response by for i = 1,... , n. Kennedy and O’Hagan [18] supposes 

that the physical response follows independent normal distribution. Specifically, they suggests 
that 

( 3 . 1 ) yP{xi) = C{xi) + ei, 

where C{xi) = Ey^{xi) and iV(0, with an unknown ? > 0. Kennedy and O’Hagan 

[18] denotes the “true” value of the calibration parameter by Oq and proposes the following 
model to link ([(•) and y^{-,9o) 

(3.2) C(-) = py^(-,0o) + <^(-), 

where p is an unknown regression coefficient, (i(.) is an unknown discrepancy function. Kennedy 
and O’Hagan [18] claims that 5 is a nonzero function because the computer code is built based 
on certain assumptions or simplifications which do not match the reality exactly. Thus y^ and 
y® are related via the model: 

(3.3) y^{x) = py\x,dQ) + 5{x) + e. 

As is typically done in the literature of computer experiments, they assume that ?/^(., •) and 
(5(.) are independent realizations of Gaussian processes. The use of Gaussian process modeling 
in computer experiment problems can be traced back to [21]. In Gaussian process modeling, 
we usually choose the Gaussian or Matern correlation functions (see (2.2) and (2.3)) and 
regard the parameters like cf) and v as unknown models parameters. Then 9q can be estimated 
from (3.3) through a Bayesian approach. 

3.2. L 2 Distance Projection. The aim of this work is to establish a theoretical framework 

for the calibration problems from a frequentist point of view, i.e., we regard {'■>') 

(3.2) as deterministic functions. For simplicity, we rewrite (3.2) as 

(3.4) C(-) = 2/^(-,0o) + <^(-)- 

This does not make much difference because we can regard the term py^{x, 9) as the computer 
output with calibration parameters {p,9). 

From now on, we suppose the physical system is deterministic, i.e., y^{xi) = ((xi) or 
equivalently = 0. Then under the framework of [18], the calibration problem can be 
formulated as 

(3.5) y^(x) = y^x,9o) + d(x}, 

where 9o is the “true” value of the calibration parameter and S is the discrepancy between y^ 
and y*(-,6»o). 

From a frequentist perspective, 9^ in (3.5) is unidentifiable, because the pair {9o,S{-)) 
cannot be uniquely determined even if y^{-) and y^{-) are known. This identihability issue is 
discussed in [3, 4, 13] and other papers. 
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The main purpose of this section is to provide a rigorous theoretical study on the estimation 
of calibration parameters. Given the lack of identifiability for 9, we need to find a well-defined 
parameter in order to study the estimation problem. A standard approach when the model 
parameters are unidentifiable is to redefine the “true” parameter value as one that minimizes 
the “distance” between the model and the observed data. First define 

(3.6) e{x,e) := yP{x) - y''{x,e). 

Here we adopt the L 2 norm in defining the distance. If another distance measure is chosen, 
the proposed mathematical framework can be pursued in a similar manner. 

Definition 3.1. The L 2 distance projection of 6 is given by 

(3.7) 0* = argmin||e(-, 6 »)||i 2 (n), 

eee 

where e is defined in (3.6). For brevity, we will also refer to 9* as the L 2 projection. 

In Definition 3.1 we find a 9 value that minimizes the L 2 discrepancy between the physical 
and computer observations, because the true value of 9 is not estimable. The value 9* given 
by (3.7) minimizes the average predictive error given by the computer code. This is relevant 
and useful since our interest lies in the prediction of the physical response. One justification 
for choosing the L 2 norm comes from the common use of the quadratic loss criterion. Suppose 
we want to predict the physical response at a new point xq and xq is uniformly distributed 
over n. Then the expected quadratic loss given 9 is 

(3.8) j^{y'P{x)-y%x,9)fdx = \\e{- , 9)\\l^^^^y 

Thus the 9 value minimizing (3.8) is the L 2 distance projection 9*. 

The value of 9* depends on the norm that is used to measure the discrepancy between 
the physical and the computer observations. If the L 2 norm is generalized to an Lp norm or 
some weighted version, the results in the paper are not affected. Detailed discussion on this 
will be deferred to Section 7. In (3.7) we implicitly assumes that the (global) minimizer of 
||e(-, 0)11^2(0) is unique. We believe that this is a reasonable assumption in many calibration 
problems. The definition of the L 2 projection also put the parameter estimation problem 
into an “engineering validation” framework. However, we still keep the wording “calibration” 
because it is the standard terminology since the foundation work by Kennedy and O’Hagan 
[18]. For a related discussion, we refer to [7, 17]. 

Since the functional forms for y^ and y® are unknown, 9* cannot be obtained by solving 
(3.7). For the problems with cheap computer code, we know y* and the function values of y^ 
over a set of design points, denoted as yP{'D). For the problems with expensive computer code, 
we know only y^ifD) and the function values of y® over the design points for the computer 
simulation, denoted as y^{Q). Call 9 a (deterministic) estimator of 9*, \i 9 depends only on 
(V,yP(V),y^) for cheap code or on (V, Q, yP(V), y^(Q)) for expensive code. For fixed y^ and 
y®, let {9n} be a sequence of estimates given by a sequence of designs (given by either {Vn} 
or {{'Dn,Gn)})- Then 9^ is said to be L 2 -consistent if 9n tends to 9* as the designs become 
dense over or x 0). The term “consistent” or “consistency” is a misnomer but we 

keep it here because of its statistical implication. 
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4. Frequentist Properties of the Kennedy-O’Hagan Model. In this section we examine 
the frequentist properties of the calibration model by [18]. Our theoretical analysis shows that 
the method is L 2 -inconsistent. We also construct some examples to show that the Kennedy- 
O’Hagan method may produce unreasonable answers. 

4.1. Simplified KO calibration. In order to conduct a rigorous mathematical analysis, 
we make the following simplifications to the Kennedy-O’Hagan method. We refer to this 
simplified version as the simplified KO calibration, or KO calibration for brevity. 

(i) The computer code is cheap. 

(ii) The physical system is deterministic. 

(iii) Without loss of generality, we can assume p = 1 (, otherwise the unknown p can be 
regarded as a calibration parameter). The discrepancy function d is a realization of a 
Gaussian process with mean 0 and the covariance function (T^<I>, where is an unknown 
parameter and the function is known. 

(iv) Maximum likelihood estimation (MLE) is used to estimate {0,a'^) instead of Bayesian 
analysis. 

The assumption (i) on cheap code will be relaxed for the L 2 calibration in Section 6. 
The assumption (ii) on deterministic physical experiment can be relaxed but will require a 
separate treatment. Further remarks are deferred to the end of Section 7. In assumption (iv), 
we only consider the likelihood portion of the Bayesian formulation in order to have a clean and 
workable mathematical analysis. As will be discussed in Section 7, this simplification should 
not affect the general message one may draw regarding the original Bayesian formulation. Not 
to break the flow, further comments and justifications for the assumptions will be deferred to 
the concluding section. 

Under these assumptions, the functions e(xj,-) are known for i = l,...,n. Then the 
likelihood function given in [18] can be simplified and it can be shown that the log-likelihood 
function for {9, u^) here is given by 

(4.1) l{0,a^-,Y) = -^loga^ - ^log]^] - ^e(x, 0)T$-ie(x, 0), 

where e(x, 9) = (e(xi, 9),..., e{xn, 9))^^, $ = {^{xi,Xj))ij and denotes the determinant of 
j4*j. For details on the likelihood functions of Gaussian process models, we refer to [22, 20]. 

Our study will employ the reproducing kernel Hilbert spaces (also called the native spaces) 
as the mathematical tool [27]. Given a symmetric and positive definite function <h, define the 
linear space 


F^{n) 


N 


/3i4>(-, Xi) : € IN, /3i e R, Xi e H 


. 2=1 


and equip this space with the bilinear form 


N M N M 

2 = 1 j = l 2=1 j = l 
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Define the native space ^^$(0) as the closure of F$(D) under the inner product (•, •)(!). The 
inner product of A/',i)(D), denoted as (•, •)a/' 4 ,(o)) is induced by (•, •)<!>. Define the native norm 

as ||/||Ar^(f 2 ) = /)A/i(C2)' Some required properties of reproducing kernel Hilbert spaces 

are given in Appendix A under Propositions A.1-A.7. Their equations are numbered as (A.l) 
to (A.7). 

Direct calculation shows that the maximum likelihood estimation (MLE) for 6 is 

(4.2) 6ko = argmine(x, 0). 

6»ee 

For hxed 6, let e(-,0) be the kernel interpolator for e{-,9) given by (2.5), i.e., 

(4.3) e(-,0) = 4.(.,x)T$-ie(x,0). 

From the dehnition of the native norm, we have 

l|e(->^)llArs(n) = e(x,0)^$^^$$~^e(x,0) = e(x,0)^$-^e(x,0), 
which, together with (4.2), gives 

(4.4) 9ko = argmin||e(-,0)||^ 

0e0 

Now we study the asymptotic properties for the KO calibration model. To this end, we 
require the design points to become dense over D. This property is measured by the fill 
distance. 

Definition 4.1. For a design T) E D”, define its fill distance as 

(4.5) h(P) := max min llxj — xll. 

xi£T> 


We use 9ko{T^) to denote the estimator 9ko under T>. Theorem 4.2 gives the limiting 
value of 9ko{T^)- 

Theorem 4.2. Suppose there exists vg E L 2 (D), such that 

(4.6) e{x,9) = / ^{x,t)vg{t)dt 

Jn 

for any 0 E 0. Moreover, suppose sup||r0||L2 < +oO; has continuous second order deriva- 

0G0 

tives, and there exists a unique 0' E 0 such that 

(4.7) l|e(-,6'0llA/-s(n) = jnf ||e(-,0)||Ats(n)- 

Then 9Ko{'^n) 9', provided that h{Vn) —>-0 as n ^ oo. 

The condition e(x, 0) = <h(x, t)vg{t)dt implies that e{-, 9) lies in a subset of A7$(D). See 

(A.l) and (A. 2) in Appendix A and [27] for further discussions. 
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4.2. L 2 norm or native norm?. Because in general O' / 9*, according to Definition 3.1, the 
KO calibration is not L 2 -consistent. A bigger issue is whether 9* is an appropriate dehnition 
for the calibration parameters. For example, suppose we adopt 9' in (4.7) as the “true” 
calibration parameters. Then the KO calibration 9ko can be declared consistent according 
to Theorem 4.2. The question is whether O' can be used as a legitimate definition for the 
calibration parameters. Here we remind that Kennedy and O’Hagan [18] states the goal of 
calibration is “... adjusting the unknown parameter until the outputs of the (computer) model 
fit the observed data”. However, we will give an example, backed by mathematical theory, to 
show that the result given by KO calibration may not agree with their original purpose. 

In view of the convergence result in Theorem 4.2, we first study how different the limiting 
value O' of the KO calibration is from 9*. To address this question, we consider the difference 
between the two norms || • 11 ^ 2 ( 0 ) || • || 7 V',g(n). This difference is related to the eigenvalues 

of the integral operator defined as 


(4.8) k(/) = [ ^{■,x)f{x)dx, 

Jn 

for / G ^ 2 ( 12 ). Denote the eigenvalues of k by Ai > A 2 > • • •. Let fi be the eigenfunction 
associated with Aj with ||/i||L 2 (c) = 1- Then it can be shown that 

(^•9) ll/*llAA.(n) = ^/*)L2(n) = \ 

where the first equality follows from (A.2) and the fact that K{\f^fi) = f. It is known in 
functional analysis that k is a compact operator and therefore limfc_^oo Afc = 0 [8]. Thus (4.9) 
yields that WfiWx^^Q^/WfiWl^in) = 00 as 2 -)■ 00 . This leads to 


(4.10) 


sup 

/eAA(f2) 


ll/llAAffi) 

II/IIl2(!^) 


= 00 , 


which implies that there are functions / with arbitrarily small L 2 norm while their A/$ norm 
is bounded away from zero. Therefore, by Definition 3.1, the KO calibration can give results 
that are far from the L 2 distance projection. The following example shows that this effect 
can indeed be dramatic. 

Example 1. Consider a calibration problem with a three-level calibration parameter. Let 
D = [—1,1], ‘h(xi,X 2 ) = exp{ —(xi — X 2 )‘^}. By using a numerical method, we obtain the 
eigenvalue and eigenfunction of k defined in (4.8). The first and second eigenvalues are 
Ai = 1.546 and X 2 = 0.398. For a better visual effect, we use the eigenfunctions whose L 2 
norms are We plot the first and second eigenfunctions of k in Figure 1. We also plot 

sin27rx for later comparison. Suppose we have three different computer codes. Denote the 
discrepancy between the physical response and each of the computer output by ei, 62 and €3 
respectively. Suppose 61 , 62,63 are the three functions given in Figure 1, i.e., 61 and 62 are the 
first and second eigenfunction of k, and 63 is sin27rx. Then which code is the best? From 
lki|lL 2 (r 2 ) = Ikalliaffi) = lk 3 llL 2 (fJ) ~ third computer code is the best according 

to Definition 3.1. 

However, by using a Gaussian process model with the same correlation function <I>, we get 
a different result. By (4.2), maximizing the likelihood function is equivalent to minimizing the 
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Figure 1. Three functions considered in Example 1. The solid and dashed lines are the first and second 
eigenfunction of k with L 2 norm \/20 respectively. The dotted line shows function sin27ra:. 


pivoted sum of square (PSS): where £{ = (ej(xi),..., ej(xn)) ^ for i = 1,2,3. We 

ehoose a spaee-filling design of 11 points, given by Xj = — 1 + (j — l)/5 for j = 1,..., 11. The 
PSSs are 12.594 for i = I, 57.908 for i = 2, and 17978.65 for i = 3. Thus the KO calibration 
will choose the first code because it has the smallest PSS value. This demonstrates that the 
likelihood-based method can give very different rankings of the competing codes from the L 2 
projection. From Figure 1 we can see that |e 3 (x)| is smaller than |ei(x)| and |e 2 (x)| for every 
X, i.e., the point-wise prediction error for the third code is uniformly smaller than the first 
two. Therefore, the KO calibration made a wrong choice. This also gives a good justification 
for choosing the L 2 norm rather than the native norm in Definition 3.1. 

To give a more general explanation for the phenomenon in Example 1, we consider the 
situations where the Matern correlation functions defined by (2.3) are used. Corollary A.6 in 
Appendix A shows that for the Matern correlation functions, the reproducing kernel Hilbert 
space equals to the (fractional) Sobolev space and the two norms are 

equivalent for v >1, where the Sobolev norm is defined by (2.1). Note that the Sobolev norm 
can be big for a function with wild oscillation even when its L 2 norm is small (the same as 
that shown by (4.10)). Thus, the Sobolev norm, which is equivalent to the native norm in 
this context, is not a good measure of discrepancy, because we only care about the magnitude 
of the discrepancy, not its oscillation. Therefore it is not suitable to use the KO calibration 
in most practical problems. For Gaussian correlation function, this problem is even more 
serious because the reproducing kernel Hilbert spaces generated by Gaussian kernels can be 
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embedded into any Sobolev space (which can be shown by Proposition A.4). 

The phenomenon shown in Example 1 can also be interpreted with the help of the 
Karhunen-Loeve expansion of Gaussian processes. Suppose z{x) is a Gaussian process with 
mean 0 and covariance function ^(•,-). Let A* be the eigenvalues of integral operator k in 
(4.8) with Ai > A 2 > • • • and fi be the eigenfunction associated with Aj with ||/i||L 2 [-i,i] = 1- 
Then the Karhunen-Loeve theorem states that z{-) admits the follow expansion 

00 

(4.11) z{x) = ^\iZifi{x), 

i=l 

where Zi’s are independent and identically distributed standard normal random variables, and 
the convergence is in L 2 [—1,1]. The expression (4.11) explains why /i is more likely to be a 
realization of z{-) among other functions with the same L 2 norm, i.e., /i yields the greatest 
likelihood value. To see this, we truncate (4.11) at a sufficiently large i, denoted as K, and 
obtain 

K 

(4.12) z{x) ^ y^^XiZifi{x). 

i=l 

Since {fi} forms an orthogonal basis in L 2 [—1,1], we can approximate any function in L 2 [—1,1] 
by 

K 

fix) « ^(/,/i)L 2 [-i,i]/*(a:)- 
i=l 


Thus the statement “/ is a realization of zi-f approximately yields AjZj = (/,/i)L 2 [-i,i]- 
This event has a probability density proportional to 


(4.13) 


piXiZi = (/,/i)L2[-i,i]) := exp 




because Zi ~ A^(0,1). Thus (4.13) can be regarded as a multiply of the probability density of 
sampling / from z(-) approximately. Now we can find the function with the largest density 
value among a set of / with the same L 2 norm, say ||/||l 2 [-i,i] ~ ^ without loss of generality. 
Let us assume Ai > A 2 , which is true for the covariance function we discussed in Example 1. 
Because 1 = 1 ] ~ /*)i 2 [-i i]> ^be function / maximizes (4.13) should satisfy 

if, /i)l 2 [-i,i] = 1 (/, /i)L 2 [-i,i] = 0 for i = 2, 3,.... Such a function is /i. 

4.3. Numerical Study on Kennedy-O’Hagan Method with Estimated Correlation Func¬ 
tion. To give a more realistic comparison, we extend the study in Example 1 by considering 
the frequentist version of the original Kennedy-O’Hagan method, in which the function is 
estimated as well. Specifically, we suppose that <1> depends on a model parameter 4>, denoted 
as 4>,/,(-, •). Then the log-likelihood function is 


= -^logfT^ - ilog|$,^| - ^e(x,0)T$^^e(x,0), 


(4.14) 


i{e,a\4>-.Y) 
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where = {^^{xi^Xj))ij. By substituting the analytical form of into (4.14), we obtain 
the log-likelihood function with respect to {0,4>): 

(4.15) l{0A-,y) = -^loge(x,6l)'^$^^e(x,6») - ilog|$<^l. 

As in Section 4.1, we estimate (0, (/>) by using the maximum likelihood. In this subsection we 
present some numerical results, which show that even when <I> is estimated, the frequentist 
KO method still suffers from the problem demonstrated in Example 1. 



Figure 2 . Log-likelihood functions for the three functions 61,62,63 given in Example 1 are plotted in solid, 
dashed and dotted lines respectively. 

Example 1 (continued). We use the same true functions ei, 62, 63 and design points as in Ex¬ 
ample 1. We compute the log-likelihood functions given in (4-15) with <k0(xi, X 2 ) = exp{—(/)(xi — 
X 2 )^}. Denote the log-likelihood function in (4-15) by l{6,(j)), where 9 = 1,2,3 correspond to 
the candidate functions 61 , 62 , 63 . The functions l{9, •) are plotted in Figure 2 for 9 = 1,2,3. 
From the figure we can see that sup^gji 2 ^ 3 }^ 0 g[i^ 6 ] ^(^) *?^) = ^(1)1)- Therefore the frequentist 
KO method will pick ei, which gives the same (incorrect) result as in Example 1. 

It can be seen from Figure (2) that the log-likelihood functions 1{1,4>) and l{2,<fi) are 
monotonic decreasing. This suggests that if (f ranges over (0,6], the MLE of (f can be even 
smaller. Unfortunately, we are not able to calculate the likelihood value for a very small cf 
because the correlation matrix becomes nearly singular. Our current numerical experience 
show that likelihood value keeps growing as (p decreases. We conjecture that the likelihood 
function is unbounded in this case, although we are not able to prove it so far. From Figure 
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(2) it can also be seen that if we fix a relatively large cj), say </> > 4, the likelihood values give a 
correct order of the L 2 discrepancy. This is not occasional. In Section 5.1 we will prove that 
such a modified version of the KO model leads to L 2 -consistent estimation. 

5. Asymptotic Results: Cheap Code. Theorem 4.2 is the first asymptotic result we 
present in this article. In this section we shall study other convergence properties, assuming 
that the computer code is cheap as in Section 4.1. 

5.1. Modified KO Calibration. Given the wide-spread use of the Gaussian process mod¬ 
eling in calibration problems (as in the KO model), a fundamental question is whether we 
can modify it to rectify its L 2 -inconsistency problem. For convenience we assume a stationary 
Gaussian process model Y{x). The correlation of Y is given by 

R{x) = CorriY {■ + x),Y (•)), 

where i? is a positive definite kernel over R'^. The Fourier transform [23] is a useful tool 
for studying stationary kernels. We will use the notation A/h(II) instead of A/r(._.)( 0) for 
simplicity. 

Definition 5.1. For f G Li(R'^) define the Fourier transform by 

f{w) := (27r)-'^/2 f 
jR'i 


From Theorem 4.2, it can be seen that the KO calibration is not L 2 -consistent if the 
correlation function R is fixed. In order to construct L 2 -consistent calibration, we should use 
a sequence of R functions indexed by n, denoted by Rn- From (A.2), the || • ||;v's(r 2 ) norm 
becomes || • Wl^ only when $(xi,X 2 ) = S{xi — X 2 ), where 6 denotes the Dirac delta function. 
We need the convergence Rn{x) —)• 6{x) in order to obtain L 2 -consistency. An easy way to 
achieve this convergence is to introduce a scale parameter. Suppose R{-;(f)) is a family of 
correlation functions on R'^ with (f > 0. Call 4> n scale parameter if R{x',(j)) = R{4>x; 1) for 
any (j) > 0 and any x G R'^. Most correlation families like Gaussian or Matern family satisfy 
these conditions. 

Write Rn{x) = R{x](j)n)- Let 9{Rn,'Dn) be the estimate of 0 given by the KO calibration 
using correlation function Rn under design Rn, referred to as the modified KO calibration. 
The L 2 -consistency requires (/>„—>■ 00. But to ensure the convergence of the interpolation, 
4>n cannot diverge too fast. The next theorem suggests that the modihed KO calibration is 
L 2 -consistent if we choose a suitable increasing rate for (/>„. Define the convolution / * f{x) = 
f^d f{x — t)f{t)dt for any / G L 2 (R'^). We list the required regularity conditions before stating 
the theorem. 

Al: supa-gf^ 0 g 0 ||Va;e(x,0)|| < -|-oo, where Vx^{x,9) is the gradient of e{x,9) with respect to 

X. 

A2: There exists (fo>0 such that sup^ge l|e(-, 6 ')llAG(.; 0 o)*fl(.; 0 o)(f^) ^ 

A3: R{-; 1) is integrable and sup^^Q „>]^ R{aw)/R{w) < -|-oo, where R is the fourier transform 
ofR(-;l). 
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Theorem 5.2. Suppose conditions (A1-A3) are satisfied and i?(-; 1) has continuous deriva¬ 
tives. Then 6{Rn,Vn) —)• 0* if fin +oo and finh{T>n) —^ 0. 

Although the modified KO calibration is L 2 -consistent, its implementation relies on a 
prespecified sequence fin- For a calibration problem with a fixed sample size, there is no 
theoretical guidelines on choosing the fi value. A more practical procedure is given in Section 

5.3, namely, a novel calibration method that is L 2 -consistent and does not rely on the choice 
of the kernel function. 

5.2. Convergence Rate. Under the assumption of Theorem 4.2, we can employ (A.6) in 
Proposition A.l to show that the interpolation error given by a kernel with 2k derivatives 
is equivalent to 0{h^^{T>n))- Given this rate, Theorem 5.3 shows that the KO calibration, 
which converges to 9' in Theorem 4.2, reaches the same rate. Let 6 = (0i,..., 6qj^. 

Theorem 5.3. Under the conditions of Theorem f.2, suppose has 2k continuous deriva¬ 
tives. We assume that 9' is an interior point of 0 and there exist a neighborhood U C Q of 
9', and functions DiVe, DijVg G C(U) such that 

de f 

(5.1) —[x,9) = J ^{x,t)DiVe{t)dt, 

(5.2) —^(x,9)= ^{x,t)DijV0{t)dt, 
for X £ Q,9 £ U and 1 < i, j < q. Moreover, 

(5-3) sup {\\DiVe\\L 2 Ti),\\DijVe\\L 2 (n)} <oo, and 

eeu,i<i,j<q 
^2 

(5.4) is invertible. 

Then pKoiVn) - 9'\\ = 0{h^\Vn)). 

The conditions (5.1) and 5.2 enhance the condition (4.6) in Theorem 4.2 by assuming the 
differentiability of e and the interchangeability of a derivative and an integral (differentiation 
under the integral sign; i.e.,Leibniz integral rule). 

Noting that consistency is a necessary requirement for an estimator, we would like to 
find an estimator that is consistent and attains the same convergence rate as in Theorem 

5.3. In the following subsection we find an estimator that guarantees L 2 -consistency and full 
efficiency. 

5.3. Least L 2 Distance Calibration. Let be the kernel interpolator defined in (2.5) for 
under design V. Define the least L 2 distance calibration by 

(5-5) = argmin||yP(-) - y""{-,9)1112(n)- 

eee 

For brevity, we will also refer to it as the L 2 calibration. [13] used the L 2 norm in a different 
context, i.e., choosing an optimal tuning parameter value to minimize the L 2 norm of the 
observed discrepancy y^ — y^. Theorem 5.4 shows that d^fiVn) converges to the L 2 projection 
9* at the optimal rate. 
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Theorem 5 . 4 . Suppose 9* is the unique solution to (3.7) and an interior point of Q; G 
ggQg 7 '||e('; ^*)lli 2 (n) invertible; has 2k continuous derivatives; and there exists a 
neighborhood U G Q of 9* such that y® G T 2 (^ x U) and y^{x, ■) G C‘^{U) for x G 0. Then as 
h{T3>n) 0 , 

(5.6) \\9L,{Vn)-9*\\=0{h’^{Vn)). 

Furthermore, if there exists v G T 2 (^) such that y^{x) = ^(x,t)v(t)dt for all x G then 
the convergence rate can be improved to 

(5.7) \\9L,{Vn)-9*\\=0{h^’^{Vn)). 


By comparing the results and conditions in Theorems 5.3 and 5.4, we can make the 
following observations. First, ||0xo(7^n) — ^^11 and \\9L2{T^n) — ^*11 enjoy the convergence rate 
0{h‘^^{Vn)) under similar conditions. Second, the L 2 calibration has the additional property 
that, even under much less restrictive conditions, it still enjoys convergence, though at a slower 
rate. But this slower rate is optimal under these conditions because the interpolator y'P has 
the same convergence rate given by (A.5). 

6. Least L 2 Distance Calibration for Expensive Code. Now we turn to the case of 
expensive computer code for which y^ cannot be evaluated for infinitely many times. In 
this situation we need another surrogate model for y^. Note that the input space for a 
computer run is x 0 C Let Q be the set of design points for the computer experiment 

with its fill distance h{Q). Suppose 0 is convex. Choose a positive definite function T over 
(n X 0) X (fl X 0). For kernel T and design Q, let y^ be the interpolate for y® defined by 
(2.5). Then we can define the L 2 calibration in a similar way: 

( 6 . 1 ) := argmin||yP(-)-y*(-,6»)||i2(n)- 

eee 

Note that the only difference from the definition in (5.5) is the replacement of y® by its 
interpolate y^ in (6.1). 

We want to study the asymptotic behavior of the L 2 calibration for expensive computer 
code. First we need to extend the definition of 9* in (3.7) to 

9*{g) := argmin lly^’(-) - y"{-,9)\\L2{n)- 


Then we have 

( 6 . 2 ) \\9L2{v,g)-9*\\ < \\9L,{v,g)-9*{g)\\ + \\e%g)-9*\\. 

If we regard the interpolate y^ as the true computer output, 9*{g) can be viewed as an “L 2 
projection”. Following similar steps as in the proof of Theorem 5.4, we can prove that 


( 6 . 3 ) 


\\9L2{Vr,,gn) - 9*{gn)\\ = 0{h\Vn)) 
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under the same conditions in Theorem 5.4. It remains to find a bound for \\0*[Q) — 0*||. The 
following theorem shows that its rate is slower than that in Theorem 5.4. 

Theorem 6.1. Suppose 9* is the unique solution to (3.7); T has 2k' continuous derivatives 
with k' > 3; 9* is an interior point ofQ; yP G L 2 (n); y® G x 0) and ggggr lk('> ^*)lli 2 (o) 

is invertible. Then \\9*{Qn) — = 0{h^ as h{Qn) —^ 0. 

Theorem 6.1, together with (6.2)-(6.3), yields the following result on the convergence rate 
of the L 2 calibration for expensive computer code. 

Theorem 6.2. Under the assumptions of Theorems 5.4 and 6.1, 

\\9L,i'Dn,Gn) - 0*11 = 0{ma^{h’^{Vn),h^'-Hgn))). 


7. Further Discussions and Remarks. This paper provides the first theoretical framework 
for studying modeling and estimation of the calibration parameters in statistical models that 
are motivated by and closely related to the original Kennedy-O’Hagan [18] approach. Being 
the first piece of such work and because of the space limitations, it has left some issues to 
be further considered. The definition of L 2 -consistency in this paper is based on using 9* in 
Definition 3.1 as the “true” calibration parameters. The same mathematical results should 
hold if a different positive definite metric is employed in defining 9*. The technical details 
may be more involved but the same lines of arguments can be used to obtain similar results. 

One may also consider other definitions of 9* by replacing the L 2 norm by alternatives 
such as the Lp norms. The least Lp norm calibration can be defined by straightforward 
modifications of (5.5) and (6.1) and its efficiency can be proved following similar arguments to 
those of Theorem 5.4 and 6.1. There also exist norms which have different physical meanings. 
For example, if the oscillation of the predictor is of interest, a Sobolev norm, which may 
relate to the energy of the system, would be appropriate. The efficient estimation in such a 
framework would require a separate investigation. 

In spite of the L 2 -inconsistency calibration results, the Kennedy-O’Hagan method (which 
gives a calibration estimator converging to 9') can give a good prediction for y^(-). This is 
supported by the upper bound in Proposition A.l. For a function with a smaller norm, 

the kernel interpolate is likely to provide a better approximation because the upper bound 
given in (A.5) is smaller. This implies that it is easier to approximate the function e(-, 0') than 
e{-,9*). For example, the solid and dashed curves in Figure 1 can be more easily estimated 
than the (more fluctuating) dotted curve, although the former two have greater point-wise 
absolute values. 

Next we turn to the discussions on the simplifications (i), (iii) and (iv) of assumptions 
made in Section 4.1. We have relaxed (i) in Section 6 for the L 2 calibration. What about 
a similar extension to the KO calibration for the expensive code? As the situation becomes 
more messy (i.e., the need of estimating the function y®), there are reasons to believe that the 
procedure will remain L 2 -inconsistent but the mathematical details can be more daunting. 
However, until further analysis is done, we are not sure if the KO calibration would converge 
to 9'. Regarding (iii), the hrst part on assuming p = 1 can be easily relaxed to an unknown 
p because p can be included as part of the calibration parameters in the theoretical analysis. 
Its second part on assuming the <I> function is known can also be relaxed but a rigorous 
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analysis will require more work. If we assume that the parameters for the function (such as 
those in (2.2)-(2.3)) lie in a compact set outside zero, our analysis should still be applicable 
because the sequence of estimated values of these parameters should have a nonzero limit 
point. For a related discussion, see [6]. Regarding assumption (iv), we can deal with the 
original Bayesian formulation in the KO paper by bringing back the prior information. Some 
heuristic calculations suggest that the prior for 6 should have no effect on the asymptotic 
results given in Theorem 4.2, provided that 9' lies on the support of the prior distribution. 
Because of the space limitations, such extensions are deferred to future work. 

In theorem 5.3, 5.4, 6.1, convergence rates are proven to be the hll distance to the power 
of some quantity related to the differentiability of the kernel function. This implies that, 
for infinitely differentiable kernel functions, such as the Gaussian covariance kernel (2.2), the 
rate of convergence can be faster than any power function of the fill distance. In fact, by 
applying the error estimate for Gaussian kernels in [27], a parallel development can show that 
an exponential rate of convergence can be achieved by using a Gaussian kernel. 

Our work can be extended in another direction. We assume in (3.5) that the physical 
system is a deterministic function y^{x) for each x. To what extent can the present work be 
extended to a stochastic physical system, where y^{x) is random for some or all x? This ex¬ 
tension can be found in [25]. The new work is a major endeavor because there are three major 
differences. First, while the current work deals with deterministic functions, the stochastic 
work deals with random functions. As a result, the required mathematical tools are quite 
different: the current work employs techniques in native spaces [27], while the latter employs 
the techniques of weak convergence. Finally, the statistical methods and results are differ¬ 
ent: the current work shows the efficiency of the interpolation-based L 2 calibration, while the 
latter proposes a novel method based on smoothing splines in the reproducing kernel Hilbert 
spaces and shows its semiparametic efficiency. Because of these major differences, the present 
work cannot be viewed as a special case of the work in [25]. In the situation where noisy 
presents, it seems that we are going to lose Theorem 4.2 (i.e., the MLE of the KO model 
may not converge), see Example 1 in [25]. In other words, to prove the asymptotic theory for 
the KO model, deterministic physical experiments is a necessary assumption. Although it is 
more realistic to assume that the physical observations are noisy, the current paper gives a 
motivation for studying these new calibration methods rather than the Kennedy-0’Hagan’s 
approach in view of the latter’s L 2 -inconsistency property. 

A method alterative to the L 2 calibration proceeds by minimizing YM=iiVi ~ 
directly. We refer to this method as the ordinary least squares (OLS) method. Under the 
present context, it can be shown that the OLS method is L 2 -consistent if {xj} is a mutually 
independent random sequence uniformly distributed over H or a quasi-Monte Carlo sequence. 
However, the rate of convergence stated in Theorem 5.4 cannot be attained by OLS. For 
the physical experiments with measurement error, a modified version of the L 2 calibration 
is also more efficient than the OLS method, see [25]. Note the in computer experiment and 
uncertainty quantihcation, a fast convergence is highly appreciated because the experiment is 
expensive in general [28]. This provides some justification of using the L 2 calibration over the 
OLS method. 

Appendix A. Reproducing Kernel Hilbert Spaces. 
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We give a brief summary of properties regarding reproducing kernel Hilbert space (RKHS). 
See [27] for details. First, if H is compact and there exists v G such that 

(A.l) f{x)= [ ^{x,t)v{t)dt, 

Jn 

then / G A/$(H) and for any g G 

(A.2) {f,9)Af^(n)= / v{x)g{x)dx. 

Jn 

The existence of v can be guaranteed if / G [12], where * d>(a;) = — 

t)^{t)dt is the convolution. Furthermore, they show that in this situation there exists a 
continuous function v satisfying (H.l) and 

(A-3) \\v\\L2{n) < II/IIa7i>*4.(!^)- 

Wendland [27] discusses the error estimates of the kernel interpolation. First, the following 
equality 

(^■4) WvWM^in) + \\y ~ yW^^in) = WvWM^in)^ 

follows from the projective property of RKHS. Proposition A.l gives the error estimates for 
the interpolation and the native norm. As before, let h{T>) be the fill distance of the design 
points. 

Proposition A.l (Wendland [27], p. 181). Suppose that has 2k continuous derivatives. 
Then there exists a constant such that 

(A.5) sup\D°‘y{x) - D^y{x)\ < C^h'‘-\°‘\{V)\\y\\_^^(^^), 

if y & A/$ and |a[ < k, where y is defined by (2.5); is independent of X and y; and Xi is 
any component ofx. Furthermore, if there exists v G L 2 (H), such that y{x) = ^(x,t)v(t)dt. 

Then the error bounds can be improved as follows: 

(A.6) sup \y{x) - y{x)\ < Cq,h‘^'^{V)\\y\\j^^^^), 

xen 

(A.7) \\y-mN,in)<C^h\V)\\v\\L,in)- 


We now turn to the extension or restriction of native spaces to another region. Assume 
that we have two convex regions Hi C H 2 C R'^ and is a positive definite kernel over H 2 x H 2 . 

Proposition A.2 (Wendland [27], p. 169). Each function f G ^^(Hi) has a natural exten¬ 
sion to a function Ef G A/i(H 2 ). Eurthermore, \\Ef\\j,j-^^Q^) = ||/||A; 4 ,(ni)- 

Proposition A.3 (Wendland [27], p. 170). The restriction /|Hi of any function f G W$(H 2 ) 
is contained inM{9.i) with \\f\^i\\M^{ni)<\\f\\Af.s>{^ 2 )- 

Usually we assume the kernel function has the form <h(x,?/) = R{x — y), where R is 
continuous and integrable over R'^. Denote the Fourier transform of R by R. Since R is 
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symmetric, R is real and R can be recovered from R. Proposition A.4 shows that the RKHS 
A/ij(R'^) can be represented by using Fourier transforms. 

Proposition A.4 (Wendland [27], p. 139). Suppose that R G C'(R‘^)nLi(R‘^) is a real-valued 
positive definite function. Then 

AfRiK'^) = {/ € L2(R‘') n C(R'') : f/y/R G L2(R'')}, 

with the inner product given by 

jRd R(w) 

Under certain conditions, the RKHSs are related to the (fractional) Sobolev spaces. Let 
[a] denote the integer part of a real number a. 

Proposition A.5 (Wendland [27], p. 201). Suppose there exist constants ci,C 2 and t, such 
that the kernel R satisfies 

ci(l + ||t(;|p)“^ < R{w) < C 2 (l + lltclp)”^, 

for tc G R with [r] > d/2. Then AfR{k}) = with equivalent norms. 

Now consider the Matern correlation family Ru, 4 , given by (2.3). Applying Theorem 6.13 
(p. 76) of [27], after some direct calculations we find that the Fourier transformation of this 
family is 

R,^^{w) = 2-^/2(4;.<^2). r(r2^-M/2) ^ ||^||2)-(.+d/2)_ 

Thus as a consequence of Proposition A.5, we obtain the following corollary. 

Corollary A.6 . For [i'-\-d/2] > d/2, the RKHS generated by the Matern correlation function 

(2.3) equals the Sobolev space with equivalent norms. 

Let Rg{x) = R{6x) for 0 > 0. The next result, given by [12], shows that, under certain 
conditions, (A.7) can be expressed in a more direct manner. 

Proposition A.7. Suppose y G (11), and y is the kernel interpolator given by Rg. If 

R has k continuous derivatives, then 

ll!/ - SllAf„(n, < 
where Cr is independent of X, y and 9. 

Appendix B. Technical Proofs. 

In this section we provide the formal proofs for Theorem 4.2, 5.2, 5.3, 5.4, 6.1. 

B.l. Proof of Theorem 4.2. Let be the kernel interpolator for e under design From 

(4.4) and (4.7), it suffices to prove that ||en(', converges to uniformly 

with respect to 0 G 0. From Proposition A.l, we have 

l|en(-,6')||^,j(n) - lk(-,^')ll^.^(n) = \\^n{-,9) - e(-,6»)||^^(f^) 

<Clh\Vr,)\\vg\\l^^^ 

< Clh^{Vn)sup\\vg\\l^f^^), 


(B.l) 






20 


Rui Tuo and C. F. Jeff Wu 


where the first equality follows from the identity (e(-, 0), €„(•, 9) — e(-, ^))jv',j(r2) = 0. The right 
side of (B.l) goes to 0 as n —)• oo and is independent of 9. This completes the proof. □ 

B.2. Proof of Theorem 5.2. Without loss of generality, we assume (po = 1. Let R^{x) = 
R{x;(l)) and Q^p = [(j)‘^R{-;(f>)) * {(p^R{-]cj))). The Fourier transform of Qp, is 

(B.2) 


We first study the relationship between || • ||_Vp^(n) and || • ||a/'q^(o)) for cp > 1. For any / G 

A/’Qi(n), by Proposition A.2, there exists an extension Ef G A/qj(R'^), such that H/llA/bA^) ~ 
\\Ef\\AfQ^{Rd). Then we have 


< II^/IIa/-q^(r^) 


(B.3) 


= (27r)“'^/2 

= (27r)“'^/2 


= (2vr) 


-d/2 


\Ef{wy 


■dw 


Rd Q/,iw) 

\Ef{wr 


-dw 


\Ef{wr 

Rd {2t:YI‘^IE‘{w/( p) 


dw 


(27r) sup{R{(pw)/R{ w)Y f 

w^o Jb 

{2Tr)~‘^^‘^ sup{R{(pw)/R{ w)Y f 
w^o Jb 


\Efyr 

Rd {2ttY/^R?{w 

\Ef{wr 


-dw 


^0 jRd Qi{w) 

2 


-dw 


= sup{R{(pw)/R{w)} \\Ef\\j^ 

< sup {R{aw)/R{w)Y\\f\y ^, 

w^0,a<l ^ 


where the first inequality follows from Proposition A.3; the hrst equality follows from Propo¬ 
sition A.4; the second equality follows from (B.2); the third equality follows from the fact that 
= (p~'^R{w/<p); the second inequality follows from factoring out {R{cpw)/R{w)Y ; the 
fourth equality follows from (B.2); the fifth equality follows from Proposition A.4; and the 
last inequality follows from condition A3 and Proposition A.2. 

Let Co = f^d R(x; l)dx. Define integral operator : L 2 (D) i— ;■ L2{^1) by 

= C'o"V'^ / R{x - y, (p)f{y)dy,x G D, 

Jn 

for any (/> > 0. Obviously Kp, is a self-adjoint operator, i.e., {f,Kfp{g)) = fo'^ 

f,gG L 2 (D). For any x G D, let x — il = {x — xq : xq G D} and (p{x — Q) = {(/>(x —xq) : xq G D}. 
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First we show that for any interior point x in 0, 

lim C'o'V'^ [ R{x - y; (t))dy 

<t>^+oo Jq 

= lim C'o'V'^ / I{y ^ X-n)R{y;(l))dy 

= lim [ I{y £ X — i})R{(j)y, l)dy 

<j>^+oo J^d 

= lim [ I{y £ 4>{x-n))R{y;l)dy 

(p^ + OO Jj^d 

(B.4) = Co"' [ R{y; l)dy = 1, 

JR'* 

where the fourth equality follows from the dominated convergence theorem and the fact that 
X — Vt contains a neighborhood of 0. 

Let Tn = R(x — y \ 4>)dy — 1. Then for any interior point x in fl, 


(B.5) 


sup \K^{e{-,9)){x) 
8 


= sup 
8 

-a 


/ R{x 

Jn 

~^4>^ f R{x — 
Jn 


- e{x,9)\ 

- yA)^{y,d)dy 
y, (j))€{x, 9)dy + rn€{x; 9) 


< sup 
8 


Co"' 


/R'* 


I{y £ (p{x - n))R{y, 1 ) 


{e{x - y/cp, 9) - e(x, 9)}dy + sup |r„e(x, 

8 


< 



I{y £ (p{x 


+\rn\ sup \e{x,9)\, 
8 


^))R{y, ^)\\y\\dy 


where the last inequality follows from the mean value theorem and condition Al. Using the 
dominated convergence theorem, we have fj^dd(y £ (p{x — Q))R{y,l){\\y\\/(p)dy —)• 0, since 
y/(p lies in the bounded set x — VL. This shows that the first term in (B.5) tends to 0. The 
second term in (B.5) also tends to 0 because of (B.4). Thus (B.5) shows that «; 0 (e(-, 0))(x) 
converges to e(a:, 9) uniformly with respect to 9. 

From the definition of native spaces, it is easily seen that 


(B-6) \\f\\\ii{n) — c|l/llAr^K(fi) 

for any /, K and c > 0. Since condition A2 holds, by applying (B.3) to e(-, 9), we see that for 
any <p > \ and 9 £ Q, e(-,^) G Consequently, from (A.1)-(A.3), for any 9 £ Q and 

(p > 1, there exists £ L 2 (U), such that e(x, 9) = R(x — t; (p)v^fi{t)dt. Thus 


e{x,9)= [ {Cq^(P'^)R{x- t-l){Co(p '^)v^fi{t)dt 
Jn 

= K{{CQ(p~'^)v^fi){x). 


(B.7) 
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Applying (A.3) and (-B.3), we have 


(B.8) 


Then 


(B.9) 


\\Cq 4> '^v^^e\\L2{n) < l|e(-,^')llAAQ^(f7) 

< sup {^(au;)/^(u;)}^||e(-, 6 »)||^ <+00. 

w^0,a<l ^ 


Cor‘'lk(-,9)115,,^,„,-|k(-,e)llL,n, 




= (n^iCocp %, 0 ),C'o(/> - e(-, 6 »)') 

\ / L2(^) 

\ / i/ 2 (n) 

= (c'o(/>“%,e,e(-, 6 ') - K^{e{-,0)) 


1,2 (H) 


< \\Co4> "*u^, 0 ||L 2 (o)||e(-, 6 ') - K^(e(-, 6 ')) 11 ^ 2 ( 0 ) 

< ||C'o 0 “%, 0 ||l 2 (q) sup|e(-, 6 ») - «;^(e(-, 6 »))| 

0 e 0 


L 2 ( 0 )’ 


where the hrst equality follows from (A. 2 ); the third equality follows from the definition of 
the fourth equality follows from the self-adjoint property of the fifth equality follows 
from (-B.7); the first inequality follows from Schwarz’s inequality. Note that {B.8) gives the 
uniform upper bound of ||C'o(/>“'^U 0 , 6 »||L 2 (f 2 ) with respect to 9. Using the dominated convergence 
theorem and (B.5), || sup^ge \e{-,9) - K^(e(-, 6*)) | ||l 2 ( 0 ) ^ 0, as 0 oo. 

Now return to the settings of Theorem 5.2. Since (jin —^ +oo as n —k oo, applying (B. 6 ), 
we have 


(B.IO) 


sup 

6 »ee 


Cq ^riR<Pri 


0 , 


as n —>• oo. On the other hand, we have 


(B.ll) 


(B.12) 


sup 

e&e 

= sup 
0ee 


. (O) - ¥n{-,9)fj^ 




'4>71 


m 






< sup ||e(-, 0 ) 11 ^ _2 (f^) 

6 S 0 *^0 ^4"n, 


< [CrCo sup {R{aw)/R{w)} sup||e(-,0)||^ 
^ w^o,a<i ee© 
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where the equality follows from (A.4); the first inequality follows from Proposition A.7; the 
second inequality follows from (B.3); and the limiting relationship follows from conditions A2, 
A3 and the fact that 4>nh{'Dn) —)• 0. One may notice that (B.ll) does not follows immediately 
from Proposition A.7 because in Proposition A.7 y is built using the kernel Ro but here e„(-, 6) 
is built using instead of However, this is not a serious problem because using 

(2.4) and (2.5), and after some simple calculations, we can verify that the two interpolators 
given by R^j,^ and C^^cp^R^^ respectively are equal to each other. Based on this equivalence. 
Proposition A.7 is still applicable. 


Now we can bound 


Co4>n ll^(■’^)llL 2 (^) 


with 


sup 

0 g 0 

= sup 

0 G 0 

< sup 
060 


Co<^rik'n(-,0)lll,,^^(o)-lk(-,0)lli2(f^) 


+ sup 
060 


Cq </>n^ ^ 


1 , (O) - l|e(->^)llL2(n) 

Cq 


where the first equality follows from (B.6); the inequality follows from the triangle inequality; 
and the limiting relationship follows from (B.IO) and (B.12). Therefore, we have established 
the following result 

e{Rn,Vn) = argmin||e„(-,0)||^ ^ argmin ||e(-, 0)||i = 6*, 

060 060 


as n —)• oo. This completes the proof. 


□ 


B.3. Proof of Theorem 5.3. For the proof we need some inequalities. The first one 
follows immediately from (A.7) in Proposition A.l: 


(B.13) 


\in{-,0) - e(-,6')||Ars(n) < Cq>h'"{'Dn)\\vg\\L^(^Qy 


From the definition of in in (4.3), we have ■§^{■,0) = ^(‘j x)'^$"^^(x, 0). Because ||^(•,0) 
is also spanned by the functions {$(-,Xj)}, f|^(-,0) is equal to the kernel interpolator for the 
pairs (xj, f§(xj,0)). As a result of (A.7) in Proposition A.l, we obtain 


(B.14) 

Similarly we have 
(B.15) 


d{in - e) 


de. 


i;0) 




<C^h’^{Vn)\\DiVe\\L,iu)- 




€n - e 


dOiddj 


i;0) 


A/'*(n) 


<C^h’^{Vn)\\DijVe\\L,(n)- 
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As l|fn(‘,is minimized at OKoi'^n), the Taylor expansion of with 

respect to 6 gives 


^?i)llAA(f2) 


d9de^ 



{9Ko{T^n) - ^ 0 ) 


where On is located between O' and OKO^J^n)- By Theorem 4.2, On —)• O'. Thus 


(B.16) 


0Ko{T^n) - O' 


-1 


d 


('i ^n)llAA(f2) ) ^ )IIa/'c 


h!^)’ 


where ggggT l|en('; ^n)ll^^(r 2 ) is invertible because of assumption (5.4) and the fact that On —)• O'. 
Furthermore, 


(B.17) 




dOidOj 


\^n{'^0n) e(-, 




dOi 


dO, 


— ||en(') ^n) e(-, ||AA(f2) 


d‘^{en - e) 


dOidOj 


•i^n) 


+ 


d{en - e) 


dO, 


{■A) 


AA(f^) 


d{en - e) 


dO, 


i-Jn) 


AA(f^) 

1 . 

AA(f 2 ) J 


where the hrst equality follows from (A.4). Invoking (B.13)-(B.15), and the condition h(T>n) —)• 
0 in Theorem 4.2, (B.17) tends to 0. This results in 


(B.18) 


d 


52 
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because On tends to O'. By the definition of O' in (4.7) and the assumption that O' is an interior 
point of 0, = 0. Therefore, we have 


A 


vs(S7) 
a'Ml 2 


d 




= 2{^H^{;e'),en{;0 ')-e{; 0 ') 


< 2 


dOi 

d{en - e) 


(B.19) 


dO, 

-2 1,2k, 




AT(f2) 


< 2C^h"‘'‘{Vn)\\ve'\\L2{n)\\DiVg/\\L^i^Q-) —)• 0 , 


where the first equality follows from (A.4); the last inequality follows from (B.13) and (B.14); 
and the limiting relationship follows from (5.3) and the fact that h{'Dn) —?• 0. Then we obtain 
the desired result by combining (B.16), (B.18) and (B.19). □ 


B.4. Proof of Theorem 5.4. First we prove the L 2 -consistency, i.e., the convergence of 
6 *^ 2 (^n) to 0*. Because Oi^iT^n) minimizes Wvn - y''{■, 0)\\\,^^^^■^ and 0* is the unique minimizer 

of||e(->^)lli 2 (o)> it suffices to prove that ^)|li 2 (Q) converges to ||e(') ^)|li 2 (r 2 ) uniformly 

with respect to 0. Note 

lir-l/^(-, 0 )llL(n)-||e(-, 0 )llL(o) 

= (l|y"-y^(-, 0 )llL 2 (O)-|| 6 (-, 0 )||L 2 (n))- 
{\\y^ - y*(u^)llL2(o) + lk(u^)llL2(f2)) 

< \\y^ - y^{-,0) - e(u^)llL2(n)(lly^ - y*('!^)lli,2(n) + lk(u^)llL2(r2)) 

(B.20) < \\y^ - y^\\L2{n)i\\y^\\L2{n) + ll2/^('! ^)\\L2{n) + Ik(') ^)llL 2 (n))- 

Because h{Vn) 0, it can be seen from (A.5) that sup|yP(x) — y^{x)\ 0. Together with 

the compactness of fl, we have \\y^ — 2 /^||i, 2 (r 2 ) = o(l) and ||y^||i, 2 (r 2 ) = 0(1)- Therefore, the 
uniform convergence is obtained from (B.20), and this leads to the L 2 -consistency of eL2{Vn). 

The convergence rate can be derived by following a similar argument as in Theorem 5.3. 
Note that for any / G ^ 2 ( 11 ), 

(B.21) ||/||l 2 < Violin) sup \f{x)\, 


where Vol{^) is the volume of 0. Because ||en(‘) ^)|li 2 (C 2 ) minimized at 0L^{Vn), 0L2{^n) 
tends to 0* and 0* is an interior point, the Taylor expansion gives 


(B.22) 


d 


0 = YQ\\r-y^{;91\\i2in) 




(A(^n)-r), 


where On is located between 0* and Oi^iJ^n)- Since y^ is independent of 0, it is easy to see 
that 

Q 0 Q 0 T\\y^ ~ y (■’^n)|lL2(f2) OOdO'^^^^^ '^ )llL2(f1)’ 
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which is invertible by the assumption. Therefore the convergence rate is given by 


(B.23) 

(B.24) 


d 




de, 

<2^./V^)C^h\Vn)U\Wn) 


I'-") 


L2{n) 


where the first equality follows from the definition of 9* in (3.7) and the fact that 9* is an 
interior point; the inequality follows from Schwarz’s inequality, (B.21) and (A.5) in Proposition 
A.l. Hence (5.6) is obtained by combining (B.22)-(B.24). 

If there exists v € 7^2(11) such that y^{x) = J^^{x,t)v(t)dt, (A.6) in Proposition A.l can 
be applied to (B.23), which proves (5.7). □ 

B.5. Proof of Theorem 6.1. As is in the proof of Theorem 5.4, the convergence of 9*{Qn) 
to 9* is a direct consequence of (A.5). As ||y^(-) — Vni'■, is minimized at 9*{Qn), the 

Taylor expansion gives 


d 


o = ^iiy"(-)-y^(-,niii,(o) 


+ 


I («*(e„) - «• 


where 9n is located between 9* and 9L^{'Dn)- It follows from (B.21) and (A.5) in Proposition 
A.l with |a| = 0,1, 2 that for all 0 £ 0, 


(B.25) 

(B.26) 




d{y" - Vn) 


d9i 

d\y^-yi) 


d9id9j 


{■,0) 


i;0) 


L^in) 


L^in) 


< -\GnW\W^inxe), 


for 1 < i,j < q, which implies 


d9d9^ 


yn-) 


yn(->^n)|| 
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L2{Q) 




0989^ 


\\yn-)-y^(;0* 


iLfo)’ 
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which is invertible by the assumption. Now as in Theorems 5.3 and 5.4, we have 

* ! L2{Q) 


+2 e-),^(.,e-) 


L2(n) 


(B.27) 


* / L2{fl) 

/ L2(0) 


Because h{Qn) —)• 0, (B.25) and (B.26) implies that for sufficiently large n, 


(B.28) 


_Mil( q*\ 


< 2 


i2(n) 


dv^ 


L2(f^) 


By applying Schwarz’s inequality to (B.27) and using the bounds in (B.25), (B.26), and (B.28), 
we obtain ^||y^(-) — Vni'i ^*)\\‘L 2 {n) ~ 0{h^'~^{Qn))- This leads to the desired result. □ 


REFERENCES 

[1] Robert A Adams and John JF Fournier, Sobolev Spaces, vol. 140, Access Online via Elsevier, 2003. 

[2] K. Anjyo and J. Lewis, RBF interpolation and gaussian process regression through an RKHS formula¬ 

tion, Journal of Mathematics for Industry, 3 (2011), pp. 63-71. 

[3] M.J Bayarri, J.O Berger, J. Cafeo, G. Garcia-Donato, F. Liu, J. Palomo, R.J Parthasarathy, 

R. Paulo, J. Sacks, and D. Walsh, Computer model validation with functional output. The Annals 
of Statistics, 35 (2007), pp. 1874-1906. 

[4] M.J. Bayarri, J.O. Berger, R. Paulo, J. Sacks, J.A. Gafeo, J. Gavendish, C.H. Lin, and J. Tu, 

A framework for validation of computer models. Technometrics, 49 (2007), pp. 138-154. 

[5] M.D. Buhmann, Radial Basis Functions: Theory and Implementations, vol. 12, Gambridge University 

Press, 2003. 

[6] A. D. Bull, Convergence rates of efficient global optimization algorithms. The Journal of Machine Learn¬ 

ing Research, 12 (2011), pp. 2879-2904. 

[7] Ghia-Jung Chang and V Roshan Joseph, Model calibration through minimal adjustments. Techno¬ 

metrics, 56 (2014), pp. 474-482. 

[8] J.B. Conway, A Course in Functional Analysis, vol. 96, Springer, 1990. 

[9] K.T Fang, R. Li, and A. Sudjianto, Design and Modeling for Computer Experiments, vol. 6, Chapman 

& Hall/CRC, 2005. 

[10] G.E. Fasshauer, Positive definite kernels: Past, present and future. Dolomite Research Notes on Ap¬ 

proximation, 4 (2011), pp. 21-63. 

[11] M. Goldstein and J. Rougier, Probabilistic formulations for transferring inferences from mathematical 

models to physical systems, SIAM Journal on Scientific Computing, 26 (2004), pp. 467-487. 

[12] B. Haaland and P.Z.G. Qian, Accurate emulators for large-scale computer experiments. The Annals 

of Statistics, 39 (2011), pp. 2974-3002. 












28 


Rui Tuo and C. F. Jeff Wu 


[13] G. Han, T.J. Santner, and J.J. Rawlinson, Simultaneous determination of tuning and calibration 

parameters for computer experiments, Technometrics, 51 (2009), pp. 464-474. 

[14] D. Higdon, J. Gattiker, B. Williams, and M. Rightley, Computer model calibration using high¬ 

dimensional output, Journal of the American Statistical Association, 103 (2008), pp. 570-583. 

[15] D. Higdon, M. Kennedy, J.C. Cavendish, J.A. Cafeo, and R.D. Ryne, Combining field data and 

computer simulations for calibration and prediction, SIAM Journal of Scientific Computing, 26 (2004), 
pp. 448-466. 

[16] V.R. Joseph and S.N. Melkote, Statistical adjustments to engineering models. Journal of Quality 

Technology, 41 (2009), pp. 362-375. 

[17] V Roshan Joseph and Huan Yan, Engineering-driven statistical adjustment and calibration, Techno¬ 

metrics, (2014). 

[18] M.C. Kennedy and A. O’Hagan, Bayesian calibration of computer models (with discussion), Journal 

of the Royal Statistical Society: Series B, 63 (2001), pp. 425-464. 

[19] J.M. Murphy, B.B.B. Booth, M. Collins, G.R. Harris, D.M.H. Sexton, and M.J. Webb, A 

methodology for probabilistic predictions of regional climate change from perturbed physics ensembles, 
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 
365 (2007), pp. 1993-2028. 

[20] C.E. Rasmussen and C.K.I Williams, Gaussian Processes for Machine Learning, The MIT Press, 2006. 

[21] J. Sacks, W.J. Welch, T.J. Mitchell, and H.P. Wynn, Design and analysis of computer experiments. 

Statistical Science, 4 (1989), pp. 409-423. 

[22] T.J. Santner, B.J. Williams, and W. Notz, The Design and Analysis of Computer Experiments, 

Springer Verlag, 2003. 

[23] E.M. Stein and G.L. Weiss, Introduction to Fourier Analysis on Euclidean Spaces, vol. 32, Princeton 

University Press, 1971. 

[24] M.L. Stein, Interpolation of Spatial Data: Some Theory for Kriging, Springer Verlag, 1999. 

[25] Rui Tuo and C. F. Jeff Wu, Efficient calibration for imperfect computer models., tech, report, Chinese 

Academy of Sciences and Georgia Institute of Technology, 2014. 

[26] S. Wang, W. Chen, and K.-L. Tsui, Bayesian validation of computer models, Technometrics, 51 (2009), 

pp. 439-451. 

[27] H. Wendland, Scattered Data Approximation, Cambridge University Press, 2005. 

[28] D. Xiu, Numerical Methods for Stochastic Computations: a spectral method approach, Princeton Univer¬ 

sity Press, 2010. 



